Back - Trigger

Learning Articles - Database Management

Lacrimae rerum. Memento mori. Memento vivere.

Structured Query Language With PostgreSQL

Structured Query Language (SQL) is a standard language described by ISO/IEC 9075 for creating and working with databases stored in a set of tables. The actual implementations through database management systems use this standard language and usually add their own extensions for additional specifications. The most common implementations include PostgreSQL, MySQL, Microsoft SQL Server, Oracle Database, and SQLite, where PostgreSQL will be used as a reference, since it is a popular, versatile, and open source database (although most core functionality is shared). These notes rely on the ideas and learnings from the respective package documentations and "SQL For Data Analysis: Advanced Techniques For Transforming Data Into Insights", 1st Edition, by Cathy Tanimura in 2021.

Background To Databases

A database is an efficient and organized collection of information from which data can be extracted. The data is often stored in tables with rows and columns, although there are various other designs, schemes, and formats for representation. It is also possible for multiple users to conveniently access the data for consistency, while providing the ability to assign permissions and authority for security. A distinction can be made between 1-tier architecture, where the database and application client are running on the same system; 2-tier architecture, where the database can be accessed from multiple application clients running on different systems; and 3-tier architecture, where the database is accessed by a server which then interfaces with application clients running on different systems (increases security, scalability, liability, and development). The most ubiquitous types of databases include navigation, hierarchical, network, relational, and non-relational. Depending on the complexity of the database, it may be stored on a single local centralized file system or cluster of distributed computers in the cloud.

Demonstration of implementations for the 1-tier architecture, 2-tier architecture, and 3-tier architecture:

The early development of databases began with navigation, hierarchical, and network databases, as general-purpose systems to procedurally access data. These approaches offered application clients the ability to navigate around a coupled set of data which formed into a large network with sequential paths following tree-like orders (always following references from other objects). This network consisted of the linked records with collections of fields which then contained single values. There are some differences in the specifications, but, there will generally be rules which mandate the relationships from parent records to child records. Although simple, this often leads to inflexibility and constraints, such as needing to traverse from the root node to retrieve data (lack of arbitrary navigational routes). There are some implementations which are still available, but these databases are rarely used for production operations.

A relational database is a structured mechanism used to store data in a series of tables, where the tables are connected or have a relationship with each other in the style of a ledger. Conventionally, these tables can be seen to represent various types of entities with rows representing instances of individual records and columns describing the attributes of those records. A primary key is a unique identifier used to distinguish each record within an individual table. This primary key can then be shared as a foreign key and function as a common identifier between tables for related records. Thus, it is possible to extract data from multiple tables through cross-referencing directly in relation to the relevant keys of the records. As a result, the management of the database is systematic and methodical with a consistent structure and facilities for data normalization, while also allowing for efficient backup and recovery.

A non-relational database can be used to overcome the inconvenience of object-relational impedance mismatch when using domain-driven object models. Examples of common non-relational databases include key-value databases, where unique keys are tied to associated values; column-store databases, where data is only stored in columns and optimized for performance; graph databases, where the entities in a database are graphically illustrated with their linked connections; and document-store databases, where data is stored in documents grouped into collections which can be retrieved. The advantages of non-relational databases are improved flexibility, high scalability (as horizontal scaling through more workers, rather than vertical scaling through a more computationally powerful system), and cost-effectiveness. These are sometimes referred to as NoSQL or non-SQL databases.

Types of databases with relational, key-value, column-store, graph, document-store, and online analytical processing:

SQL And PostgreSQL

datastore?

In general, a database consists of schemas which provide organization and structure by describing the way in which the data in a schema may relate to other components and elements of the database. Within a schema, the objects most commonly used are tables, views, and functions. As mentioned, tables contain ...fields... and have indexes (usually for the primary key or commonly used columns), which is a special kind of data structure allowing data to be retrieved more efficiently. Views are virtual tables which allow data to be presented in an alternate format without modifying or altering the underlying tables (does not actually store data, so they could be thought of as stored queries). In creating a view, it is possible to employ a set of operations (based on the mathematical system of relational calculus) to define, filter, aggregate, or transform the data extracted from the underlying tables. Functions allow commonly used sets of calculations or procedures to be stored and easily referenced in queries. A database will often have an associated data dictionary, as a document or repository which gives a clear description of the attributes of the data.

Example of the typical organization and structure of a simple database:

SQL was designed to offer a set-based and declarative mechanism for interacting with a database management system with a scope extending to data query, data definition, data control, and data manipulation. It can be sub-divided into several language elements, including clauses, expressions, predicates, queries, and statements. For interest, SQL was initially developed at IBM by Donald Chamberlin and Raymond Boyce after ...learning about... the relational model from Edgar Codd in the early 1970s (originally called Structured English QUEry Language (SEQUEL)). SQL became a standard of the American National Standards Institute (ANSI) in 1986 and International Organization For Standardization (ISO) in 1987. As mentioned, actual implementations through database management systems use this standard language and usually add their own extensions for additional specifications. It should be noted that the keywords in commands are not case-sensitive (and whitespace is insignificant), but it is convention to use uppercase to make it easier to distinguish between keywords, database, table, and column names (database, table, and column names are case-sensitive).

The mentioned sub-languages of SQL for communicating a database include data query, data definition, data control, and data manipulation. The data query is associated with extracting data from a database through indexing and selection from multiple schemas, tables, and views (common commands include SELECT, FROM, and WHERE). The data definition is associated with creating, modifying, and deleting tables, views, and other objects in the database, where the structure is defined without affecting the content (common commands include CREATE, ALTER, and DROP). The data control is associated with accessing the database by assigning or removing permissions for users (common commands include GRANT and REVOKE). The data manipulation is associated with adding, updating, and deleting the content of the data (common commands include INSERT, UPDATE, and DELETE).

... :

PostgreSQL (originally called POSTGRES) is an object-relational database management system. It features transactions with atomicity consistency isolation durability properties (retains integrity while performing transactions), complex queries, automatically updatable views, materialized views, triggers, foreign keys, and stored procedures. It is designed to emphasize extensibility and allow for the modification and creation of new data types, functions, operators, index methods, and procedural languages (including direct support for Perl, Tcl, and Python with external support for many other languages). Since its initial release in 1996-07, it has established a reputation for reliability, robustness, security, and performance. The official documentation is a useful resource for reference of information and tutorials.

Each of the columns in a table of a database will have an associated data type for the ...associated... rows. The common and supported data types in PostgreSQL include string as char (fixed length), varchar (variable length), text (longer variable length); numerical as int (with versions for precision), float (with versions for precision), and decimal (specification of decimal places); logical as boolean (true or false); and datetime as date (year, month, and day), time (hours, minutes, and seconds), and timestamp (date and time). Other useful data types include json, xml, and bytea ... .

Initial Installation

There are no unusual prerequisites to install PostgreSQL, apart from the minimum hardware requirements for the system. PostgreSQL can usually be installed through a package manager, as conventionally performed through the native package manager of a Linux distribution (although this version may not be officially maintained). Alternatively, the official repository link can be added to the native package manager to fetch the official version of PostgreSQL through manual configuration. ...Once installed, the PostgreSQL server is started automatically and enabled by default to start at system reboot. The status of the PostgreSQL server can be checked with systemd commands...

Manually configure the target repository maintained by the PostgreSQL Global Development Group:
			~ $ sudo sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
			~ $ wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
Install PostgreSQL using Apt package manager with the default or configured repositories:
~ $ sudo apt install postgresql

Creating And Deleting Databases

A database can be created with an assigned name. ... and requires the create privilege on the database. Additional options can be supplied for the creation of the database. These options include specifying the database character set (default as collate), .

It is also possible to drop all the tables in a specific database and delete the database completely. This should be used with caution and requires a drop privilege on the database. When deleting a database, the database directory is removed along with files associated with the database which were created by the implementation during operation (including .BAK, .DAT, .HSH, .MRG, .MYI, .cfg, .db, .ibd, and .ndb). This will not remove temporary tables which were created in the database beforehand, as these will only be automatically removed when the session which created them ends.

Create a database with a given name:
			CREATE {DATABASE | SCHEMA} [IF NOT EXISTS] database_name
				[create_option] ...
			
			create_option: [DEFAULT] {
				CHARACTER SET [=] charset_name
			| COLLATE [=] collation_name
			| ENCRYPTION [=] {'Y' | 'N'}
			}
Delete a database with a given name:
			DROP {DATABASE | SCHEMA} [IF EXISTS] database_name

Database privileges...

...Querying...

The SELECT clause determines which columns, specified as expressions, will be returned by the query. These expressions may also include manipulations, such as aggregating data, ...choosing... different cases, or converting data types. The FROM clause determines the sources from which the expressions in the SELECT clause are derived. These sources may be tables, views, or sub-queries (if multiple sources are specified, it is also necessary to include a JOIN operation). An alias can also be assigned to the sources for convenient referencing in other clauses in the query. The WHERE clause specifies restrictions or filters which are used to exclude rows from the results. When exploring collections of large data, the LIMIT clause can be used to restrict the number of rows included in the results (synonymous with TOP for some implementations of database management systems).

With regard to JOIN operations, an INNER JOIN will return all of the rows which match in both sources and OUTER JOIN will return all of the rows in both tables regardless of if they match (with NULL if for keys of unmatched rows). This operation can also be specified to have a type as LEFT JOIN to return all of the rows from the first source but only matching rows from the second source or RIGHT JOIN to return all of the rows from the second source but only matching rows from the first source. For ..., a CROSS JOIN ... . A Cartesian join will result when each row in a source matches multiple rows in the other source (although this should generally be avoided).

https://www.postgresql.org/docs/current/sql-select.html [ WITH [ RECURSIVE ] with_query [, ...] ] SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ] [ * | expression [ [ AS ] output_name ] [, ...] ] [ FROM from_item [, ...] ] [ WHERE condition ] [ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ] [ HAVING condition ] [ WINDOW window_name AS ( window_definition ) [, ...] ] [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ] [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ] [ LIMIT { count | ALL } ] [ OFFSET start [ ROW | ROWS ] ] [ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ] [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ] https://www.postgresql.org/docs/current/queries.html https://www.postgresql.org/docs/current/queries-select-lists.html https://www.postgresql.org/docs/current/queries-table-expressions.html Example of a simple query with selection, joining, filtering, and limiting: SELECT Table1.ColumnA Table1.ColumnB Table2.ColumnC FROM dbo. ... AS Table1 dbo. ... AS Table2 LEFT OUTER JOIN Table2 ON Table1.ColumnA = Table2.ColumnA WHERE ... LIMIT ...

...Manipulating...

Aggregation. When aggregating data, it is also necessary to include a GROUP BY clause to specify ... (can use columns or positional notation for referencing).

https://www.postgresql.org/files/documentation/pdf/16/postgresql-16-A4.pdf